D1.3 decode-kernel + residual composition (Phase 1 scaffold complete, 104/104 tests) by AdaWorldAPI · Pull Request #235 · AdaWorldAPI/lance-graph

AdaWorldAPI · 2026-04-20T23:57:53Z

Summary

Final Phase 1 scaffold deliverable — D1.3 decode-kernel trait + residual composition. Scope corrected after loading the canonical architecture docs (cognitive-shader-architecture.md + ripple-dto-contracts.md + encoding-ecosystem.md): this module sits on the hydration / calibration path, NOT the cascade inference path (which uses p64_bridge::CognitiveShader::cascade with 8 predicate planes × bgz17 O(1) distance — no per-inference codec work).

104/104 cognitive-shader-driver --features serve tests pass (+9 new).

What lands

crates/cognitive-shader-driver/src/decode_kernel.rs — ~280 LOC:

pub trait DecodeKernel: Send + Sync + std::fmt::Debug {
    fn decode(&self, bytes: &[u8]) -> Result<Vec<f32>, DecodeError>;
    fn encode(&self, values: &[f32]) -> Result<Vec<u8>, DecodeError>;
    fn bytes_per_row(&self) -> u32;
    fn dim(&self) -> u32;
    fn signature(&self) -> u64;           // JIT cache key
    fn backend(&self) -> &'static str;     // never "scalar" on SoA
}

// Field types are elided in this summary.
pub struct StubDecodeKernel { dim, tag }    // byte-exact round-trip for tests
pub struct ResidualComposer {
    base: Box<dyn DecodeKernel>,
    residual: Box<dyn DecodeKernel>,        // can itself be a ResidualComposer (depth > 1)
}

pub enum DecodeError { SizeMismatch { expected, actual }, Stage { stage, detail } }

Composition semantics (matches plan D1.3 spec):

encode(v) = [ base.encode(v) ; residual.encode(v - base.decode(base.encode(v))) ]
decode(enc) = base.decode(enc[..base_b]) + residual.decode(enc[base_b..])

Stages compose recursively — the residual slot is itself Box<dyn DecodeKernel>, so a depth-2 composer has another ResidualComposer in its residual slot. Tests verify byte-exact round-trip through nested-depth-2 all-stub composition (3 stages, 4 dim × 4 bytes × 3 = 48 bytes per row).
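
The composition rule above can be sketched as follows. This is a minimal illustration, not the code in decode_kernel.rs: the trait is trimmed to four methods, and the field types, the `new` constructor name, and the reuse of `SizeMismatch` for a dimension mismatch are assumptions made for the example.

```rust
use std::fmt::Debug;

#[derive(Debug, PartialEq)]
pub enum DecodeError {
    SizeMismatch { expected: usize, actual: usize },
}

// Trimmed trait: signature() and backend() are left out of this sketch.
pub trait DecodeKernel: Send + Sync + Debug {
    fn decode(&self, bytes: &[u8]) -> Result<Vec<f32>, DecodeError>;
    fn encode(&self, values: &[f32]) -> Result<Vec<u8>, DecodeError>;
    fn bytes_per_row(&self) -> u32;
    fn dim(&self) -> u32;
}

/// Stub stage: stores every f32 as its 4 native-endian bytes (lossless).
#[derive(Debug)]
pub struct StubDecodeKernel {
    pub dim: u32, // assumed field type
}

fn check(expected: usize, actual: usize) -> Result<(), DecodeError> {
    if expected == actual {
        Ok(())
    } else {
        Err(DecodeError::SizeMismatch { expected, actual })
    }
}

impl DecodeKernel for StubDecodeKernel {
    fn decode(&self, bytes: &[u8]) -> Result<Vec<f32>, DecodeError> {
        check(self.bytes_per_row() as usize, bytes.len())?;
        Ok(bytes
            .chunks_exact(4)
            .map(|c| f32::from_ne_bytes([c[0], c[1], c[2], c[3]]))
            .collect())
    }
    fn encode(&self, values: &[f32]) -> Result<Vec<u8>, DecodeError> {
        check(self.dim as usize, values.len())?;
        Ok(values.iter().flat_map(|v| v.to_ne_bytes()).collect())
    }
    fn bytes_per_row(&self) -> u32 { self.dim * 4 }
    fn dim(&self) -> u32 { self.dim }
}

#[derive(Debug)]
pub struct ResidualComposer {
    base: Box<dyn DecodeKernel>,
    residual: Box<dyn DecodeKernel>,
}

impl ResidualComposer {
    /// Rejects stages whose dims differ (error variant chosen for brevity).
    pub fn new(
        base: Box<dyn DecodeKernel>,
        residual: Box<dyn DecodeKernel>,
    ) -> Result<Self, DecodeError> {
        check(base.dim() as usize, residual.dim() as usize)?;
        Ok(Self { base, residual })
    }
}

impl DecodeKernel for ResidualComposer {
    fn encode(&self, values: &[f32]) -> Result<Vec<u8>, DecodeError> {
        // [ base.encode(v) ; residual.encode(v - base.decode(base.encode(v))) ]
        let mut out = self.base.encode(values)?;
        let approx = self.base.decode(&out)?;
        let resid: Vec<f32> = values.iter().zip(&approx).map(|(v, a)| v - a).collect();
        out.extend(self.residual.encode(&resid)?);
        Ok(out)
    }
    fn decode(&self, bytes: &[u8]) -> Result<Vec<f32>, DecodeError> {
        check(self.bytes_per_row() as usize, bytes.len())?;
        // base.decode(enc[..base_b]) + residual.decode(enc[base_b..])
        let (base_part, resid_part) = bytes.split_at(self.base.bytes_per_row() as usize);
        let mut out = self.base.decode(base_part)?;
        let resid = self.residual.decode(resid_part)?;
        for (dst, r) in out.iter_mut().zip(&resid) {
            *dst += *r;
        }
        Ok(out)
    }
    fn bytes_per_row(&self) -> u32 {
        self.base.bytes_per_row() + self.residual.bytes_per_row()
    }
    fn dim(&self) -> u32 { self.base.dim() }
}
```

Placing a `ResidualComposer` in the residual slot of another composer gives the depth-2 case: three stub stages of 4 values each, so 48 bytes per row. Because the stub is lossless, every residual is zero and the decoded output equals the input.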

Tests (9 new)

stub_round_trip_is_exact
stub_rejects_wrong_input_size
residual_compose_round_trip_is_exact_when_both_stubs
residual_compose_mismatched_dims_rejected
residual_compose_bytes_per_row_sums_stages
residual_compose_nested_depth_two_round_trip
signatures_distinguish_composer_from_stages
signature_depends_on_stage_order (base+residual ≠ residual+base — order is part of identity)
composer_backend_reports_stub_when_any_stage_is_stub (weakest-link reporting)
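
The last three tests concern signatures and backend reporting. One way such behavior could be obtained is sketched below; the free functions, the `u64` tag type, the "residual_composer" domain string, and the "jit" backend name are all hypothetical and are not taken from the PR.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Stub signature: hash of a fixed label plus dim and tag.
/// The tag type (u64) is an assumption for this sketch.
pub fn stub_signature(dim: u32, tag: u64) -> u64 {
    let mut h = DefaultHasher::new();
    "stub_decode".hash(&mut h);
    dim.hash(&mut h);
    tag.hash(&mut h);
    h.finish()
}

/// Composer signature: the stage signatures are fed in order, so
/// swapping base and residual changes the result. The leading label
/// keeps a composer distinct from either of its stages.
pub fn composer_signature(base_sig: u64, residual_sig: u64) -> u64 {
    let mut h = DefaultHasher::new();
    "residual_composer".hash(&mut h); // hypothetical domain label
    base_sig.hash(&mut h);
    residual_sig.hash(&mut h);
    h.finish()
}

/// Weakest-link reporting: "stub" if either stage is a stub,
/// otherwise the base stage's backend.
pub fn composer_backend(base: &'static str, residual: &'static str) -> &'static str {
    if base == "stub" || residual == "stub" {
        "stub"
    } else {
        base
    }
}
```

`DefaultHasher` is used here only to keep the sketch dependency-free. Its algorithm is not guaranteed to stay the same across Rust releases, so a cache key that must outlive a single build would need a fixed hash function instead.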

Scope correction (per loaded orientation)

Before this PR, my framing of "codec kernels" drifted toward treating them as inference-path infrastructure. Reading cognitive-shader-architecture.md lines 582+ made the distinction explicit:

Path	Uses
Cascade inference (Layer 2, per-cycle, ns budget)	`p64_bridge::CognitiveShader::cascade(query, radius, layer_mask)` — 8 predicate planes × bgz17 O(1) palette distance, no Hamming, no POPCNT, table lookup only
Hydration / calibration (one-time per model, seconds-to-minutes)	D1.x codec kernels — decode/encode tested against token-agreement gate; once a codec graduates, it runs at weight ingest (GGUF → palette + Fingerprint<256>), not per-inference

StubDecodeKernel is the test fixture; real decoders (once D1.1b lands the ndarray jitson_cranelift::JitEngine adapter) replace it. The composition pattern remains stable across that transition.

Phase 1 state after merge

D-id	Deliverable	Status
D1.1	`CodecKernelCache<H>` scaffold	✅ Shipped (#233)
D1.1b	Adapter to `ndarray::hpc::jitson_cranelift::JitEngine`	⏳ Queued
D1.2	Rotation primitives (Identity / Hadamard / OPQ stub)	✅ Shipped (#234)
D1.3	Decode-kernel trait + residual composition	✅ This PR

After merge, the Phase 1 scaffold is complete. D1.1b (real Cranelift wiring) is the only remaining Phase 1 piece. It supplies Box<dyn DecodeKernel> kernels that wrap ndarray's JitEngine and take the place StubDecodeKernel currently occupies in ResidualComposer; no composition-layer changes are required.

Board hygiene (same commit)

STATUS_BOARD.md — D1.3 Queued → In PR.

Test Plan

cargo test --manifest-path crates/cognitive-shader-driver/Cargo.toml --features serve — 104/104 pass (+9 new)
cargo test -p lance-graph-contract --lib — 147/147 pass (unchanged)
cargo test --manifest-path crates/jc/Cargo.toml — 6/6 pass (JC substrate proof unchanged)
Rules A-F honored at the composition layer (A/B/E/F apply directly; C defers to per-stage kernel backend)

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

Final Phase 1 scaffold deliverable. Hydration/calibration path, NOT cascade inference path (per cognitive-shader-architecture.md line 582: the cascade uses p64_bridge::CognitiveShader::cascade + 8 predicate planes × bgz17 O(1) distance, no per-inference codec work).

crates/cognitive-shader-driver/src/decode_kernel.rs (~280 LOC):

DecodeKernel trait (object-safe, Send+Sync+Debug):
  decode(&self, &[u8]) -> Result<Vec<f32>, DecodeError>
  encode(&self, &[f32]) -> Result<Vec<u8>, DecodeError>
  bytes_per_row() -> u32
  dim() -> u32
  signature() -> u64            # JIT cache key
  backend() -> &'static str     # never "scalar" on SoA

StubDecodeKernel { dim, tag }: byte-exact f32 ↔ u8 round-trip via native-endian reinterpret. No quantization, no compression; exists so composition plumbing can be tested without a trained palette. Backend = "stub". Signature hashes "stub_decode" + dim + tag.

ResidualComposer { base: Box<dyn DecodeKernel>, residual: Box<dyn DecodeKernel> }
Two-stage residual composition:
  encode(v) = [base.encode(v); residual.encode(v - base.decode(base.encode(v)))]
  decode(enc) = base.decode(enc[..base_b]) + residual.decode(enc[base_b..])
Nests recursively: the residual slot can itself be a ResidualComposer (depth > 1). Rejects mismatched dims at construction. Backend = "stub" if either stage is stub, else base's backend (weakest-link reporting for latency-critical stages).

DecodeError { SizeMismatch { expected, actual }, Stage { stage, detail } }

Tests (9 new, all under --features serve):
- stub_round_trip_is_exact
- stub_rejects_wrong_input_size
- residual_compose_round_trip_is_exact_when_both_stubs (both stubs = byte-exact; residual all zero; output == input)
- residual_compose_mismatched_dims_rejected
- residual_compose_bytes_per_row_sums_stages
- residual_compose_nested_depth_two_round_trip (ResidualComposer whose residual IS another ResidualComposer; depth=2 encodes 3 stages, still byte-exact when all stubs)
- signatures_distinguish_composer_from_stages
- signature_depends_on_stage_order (base+residual vs residual+base produce different signatures)
- composer_backend_reports_stub_when_any_stage_is_stub

Scope clarification (per orientation loaded this session from cognitive-shader-architecture.md + ripple-dto-contracts.md):
- D1.x codec kernels = hydration/calibration path
- Cascade inference path = p64_bridge::CognitiveShader at L2
- Real kernels replace StubDecodeKernel once D1.1b lands the ndarray::hpc::jitson_cranelift::JitEngine adapter

Board hygiene (CLAUDE.md Mandatory rule): STATUS_BOARD.md D1.3 Queued → In PR

Rules honored:
  Rule A: in-place &mut operations via Vec; no manual index math
  Rule B: no std::arch / no hpc::simd_avxNNN reach
  Rule C: n/a at the composition layer (real kernel backend selection defers to D1.1b per-stage)
  Rule D: codec params come from CodecParams via Wire DTOs (D0.1-D0.3)
  Rule E: trait methods expose signature + backend + bytes_per_row
  Rule F: no serialization between stages; Vec<f32>/Vec<u8> owned

https://claude.ai/code/session_01SbYsmmbPf9YQuYbHZN52Zh

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3f58967902

chatgpt-codex-connector · 2026-04-21T00:01:16Z

+        for (dst, &r) in out.iter_mut().zip(&residual_v) {
+            *dst += r;


Reject stage decode length mismatches before summing

ResidualComposer::decode adds stage outputs with zip, which truncates to the shorter vector. If either stage returns fewer than dim() elements, this returns Ok(...) with silently corrupted output instead of surfacing an error, even though the trait contract says decode should produce full-dimension vectors. Add explicit length checks for both decoded stage vectors before the accumulation loop so malformed/buggy stage implementations fail fast.
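
A minimal sketch of the check this review asks for is shown below. The helper name `sum_stages`, its signature, and the field types of `DecodeError::Stage` are assumptions for illustration; the actual fix in decode_kernel.rs may look different.

```rust
#[derive(Debug, PartialEq)]
pub enum DecodeError {
    // Field types are assumed; the PR names only `stage` and `detail`.
    Stage { stage: &'static str, detail: String },
}

/// Sums the decoded base and residual vectors, but only after confirming
/// that both have exactly `dim` elements. Without the check, `zip` stops
/// at the shorter input and a short stage output goes unnoticed.
pub fn sum_stages(
    dim: usize,
    base_v: Vec<f32>,
    residual_v: &[f32],
) -> Result<Vec<f32>, DecodeError> {
    for (stage, len) in [("base", base_v.len()), ("residual", residual_v.len())] {
        if len != dim {
            return Err(DecodeError::Stage {
                stage,
                detail: format!("decoded {len} values, expected {dim}"),
            });
        }
    }
    let mut out = base_v;
    for (dst, &r) in out.iter_mut().zip(residual_v) {
        *dst += r;
    }
    Ok(out)
}
```

With this guard, a residual stage that returns a single value for a 3-dimensional row produces an error naming the "residual" stage instead of an `Ok` whose trailing elements were never updated.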

AdaWorldAPI merged commit 6bed7ae into main Apr 21, 2026
0 of 6 checks passed

AdaWorldAPI merged 1 commit into main from claude/teleport-session-setup-wMZfb
